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X^f^ ■ Accurate forecasting of zero coupon bond yields for a continuum 

of maturities is paramount to bond portfolio management and deriva- 
tive security pricing. Yet a universal model for yield curve forecasting 
has been elusive, and prior attempts often resulted in a trade-off be- 
tween goodness of fit and consistency with economic theory. To ad- 
dress this, herein we propose a novel formulation which connects the 
dynamic factor model (DFM) framework with concepts from func- 
tional data analysis: a DFM with functional factor loading curves. 
This results in a model capable of forecasting functional time se- 
ries. Further, in the yield curve context we show that the model re- 
tains economic interpretation. Model estimation is achieved through 
an expectation-maximization algorithm, where the time series pa- 
rameters and factor loading curves are simultaneously estimated in 
a single step. Efficient computing is implemented and a data-driven 
smoothing parameter is nicely incorporated. We show that our model 
performs very well on forecasting actual yield data compared with ex- 
isting approaches, especially in regard to profit-based assessment for 
an innovative trading exercise. We further illustrate the viability of 
our model to applications outside of yield forecasting. 

1. Introduction. The yield curve is an instrument for portfolio man- 
agement and for pricing synthetic or derivative securities [Diebold and Li 
(2006)]. Bond prices are hypothesized to be a function of an underlying con- 
tinuum of yields as a function of maturity, known as the yield curve. Our 
contribution to the yield literature is pragmatic: we introduce a dynamic 
factor model with functional coefficients which reconciles the theory-based 
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desire to model yield data as a curve with the applied need of accurately 
forecasting that curve over time. 

The yield curve is a theoretical construct not without its own inherent 
practical difficulties. First and foremost, although yield determines prices, 
only bond prices are observed for a set of discrete maturity horizons; from 
these a corresponding discrete set of yields are calculated. Thus, the yields 
themselves are not directly observed, nor is an entire curve for every possible 
maturity. Further, not only is it of interest to know the yield for all maturities 
at each point in time (cross-sectional), but also for a single maturity as it 
evolves over time (dynamic). Finally, because a bond at time i of maturity t 
is essentially the same bond as the one at time i + 1 of maturity t — 1, 
there is also a certain amount of systematic cross- correlation in yield data. 
Therefore, predictive modeling of bond data needs to consider each of the 
cross-sectional, dynamic and cross-correlational behaviors. 

To this end, yield curve models have traditionally assumed either of two 
formulations. The first is theoretical in nature: as in Hull and White (1990) 
and Heath, Jarrow and Morton (1992), for a given date the emphasis is on 
fitting a yield curve to existing yields based on no-arbitrage principles stem- 
ming from economic theory. The other approach is the so-called equilibrium 
or affine-class models where time series techniques are used to model the 
dynamics of yield on a short-term or instantaneous maturity, and yields for 
longer maturities are then derived using an affine model. This method has 
been developed in works such as Vasicek (1977), Cox, Ingersoll and Ross 
(1985), and Duffie and Kan (1996). 

These contrasting methods illustrate the dichotomy of yield forecast mod- 
els. As a practical matter, goodness of fit is paramount in a model for it to be 
of any use. Still, a yield model should be consistent with its underlying the- 
ory, and maintain a degree of economic interpretation. Cross-sectional/no- 
arbitrage models ignore the dynamics of yields over time [as noted in Diebold 
and Li (2006), Koopman, Mallee and Van der Wei (2010), e.g.] and thus 
threaten the former yet satisfy the latter. Time series/equilibrium models 
place emphasis on the former at the expense of the latter [as seen in Duffee 
(2002)]. 

What we propose in this paper is a synthesis of the cross-sectional and dy- 
namic considerations mentioned above. We approach yield curves as a func- 
tional time series; the yields of the observed maturities are a discrete sam- 
pling from a true underlying yield curve. To this end, we conflate concepts 
from functional data analysis [FDA; Ramsay and Silverman (2002, 2005)] 
and from dynamic factor analysis/modeling [DFM; Basilevsky (1994), e.g.]. 
Ours is a dynamic factor model with functional coefficients which we call 
(not surprisingly) the functional dynamic factor model (FDFM) . These func- 
tional coefficients, or factor loading curves, are natural cubic splines (NCS): 
a significant result which facilitates interpolation of yields both within and 
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out of sample so that forecasts are indeed true yield curves. While the factor 
loadings account for the cross-sectional/curve dimension of yields, the dy- 
namic factors, in turn, determine the evolution of these functions over time. 
Thus, they account for the time series and cross-correlational nature of yield 
data. Our particular specification of the FDFM enables its estimation via 
the Expectation Maximization (EM) algorithm [Dempster, Laird and Rubin 
(1977)]. 

Why the need for both a functional and a dynamic factor framework? 
Recall that the unifying goal is to develop a model that is consistent with 
the concept of the yield curve posited by economic theory and is of use for 
practical forecasting. A naive attempt to merge the latter need with the for- 
mer is to model yields for all observed maturities over time as a multivariate 
time series. However, as the number of observed maturities increases to even 
moderate size, vector autoregressive models (VARs) — for example — become 
intractable because of high dimensionality. 

Abstracting from the yield setting for a moment, in a more general sense 
large multivariate time series have been successfully modeled [Engle and 
Watson (1981), Geweke and Singleton (1981), Molenaar (1985), Peha and 
Box (1987), Peha and Poncela (2004), to name just a few] using a dynamic 
factor approach. In DFMs the multivariate data are assumed to be depen- 
dent on a small set of unobserved dynamic factors. This solves the dimen- 
sionality problem, yet DFMs per se leave to question the interpretability of 
the unobserved factors. Further, in our present context, DFMs fall short of 
producing a functional yield curve. 

To incorporate the functional aspect, we propose to combine the DFM 
framework with ideas from functional data analysis (FDA). However, FDA 
in general is an area still nascent in development, and most applications deal 
primarily with collections of independent curves. Earlier work by Besse, Car- 
dot and Stephenson (2000) applied functional autoregressive models (FAR) 
to univariate climatological data: the seasonal cycle is hypothesized to be 
functional. In a similar hypothesis, Shen (2009) forecasted periodic call vol- 
ume data using a method akin to functional principle component analysis 
(FPCA). In an applied setting more similar to ours, Hyndman and Shang 
(2009) developed a weighted FPCA method to forecast time series of curves 
and applied it to multivariate time series of fertility or mortality data in- 
dexed by different ages. Yet, unlike these models where FPCA and time 
series modeling are performed in separate steps, ours is a method that esti- 
mates both functional and time series components simultaneously, and does 
so in a quite natural manner. 

Within the context of yield curve forecasting, other recent developments 
have begun to reconcile the statistical viability of DFMs and functional 
data analysis with the underlying theory in regard to yield dynamics — 
a constraint which all but requires the usually absent interpretation for the 
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dynamic factors. Diebold and Li (2006) introduced the Dynamic Nelson- 
Siegel model (DNS): a three factor DFM with functional coefficients esti- 
mated in two steps, which extends the original Nelson-Siegel model [Nel- 
son and Siegel (1987)]. The functional coefficients are pre-specified as fixed 
parametric curves and the authors further provide an economic interpre- 
tation of each. Koopman, Mallee and Van der Wei (2010) extended the 
DNS specification to allow (G)ARCH volatility and a fourth dynamic fac- 
tor which allows time dependence to the otherwise fixed parametric factor 
loading curves. Another DFM- type approach is provided by Bowsher and 
Meeks (2008) which present a cointegrated DFM using natural cubic splines 
(NCS). Spline knots serve as dynamic factors following an error correction 
model process; the knot locations are determined via an initial exhaustive 
search-selection procedure prior to model estimation. As noted in Koopman, 
Mallee and Van der Wei (2010), cointegrated factors present a difficulty in 
terms of retaining economic interpretation. 

Presented in this paper is our functional dynamic factor model (FDFM) 
which we show to perform very well in regard to yield curve forecasting. Fur- 
ther, we do so in multiple assessments which highlight the model's capability 
of accurately forecasting the entire function as well as the potential profit 
generated from employing these forecasts in trading strategies. Finally, via 
our online supplement (a brief description follows Section 4), in simulation 
studies we illustrate the accuracy of both FDFM forecasts and predicted 
parameters. In either sense the FDFM outperforms existing models which 
require either multiple-step estimation or lack a functional component. 

It is worth noting our FDFM is in a similar vein as those of the aforemen- 
tioned yield models: a dynamic factor model with functional coefficients; 
one which — quite coincident ally — even exploits the properties of NCS for 
the cross-sectional/curve dimension of yields. However, unlike Diebold and 
Li (2006), the FDFM functional coefficients are estimated; thus, they are 
free to vary with the particular application to explain the functional nature 
of the data. Further, as opposed to the existing two classes of models, esti- 
mation of the FDFM is achieved in a single step. Within the yield context it 
will be seen that the FDFM satisfies our two aforementioned criteria: good- 
ness of fit and economic interpretability. That the factor loading curves are 
estimated facilitates application of the FDFM to contexts outside of yield 
curve forecasting as well. We will show through simulation (online supple- 
ment) that our specification even permits the inclusion of observed nonlatent 
variables in the dynamic factors similar to Diebold, Rudebusch and Aruoba 
(2006). 

The remainder of our paper is organized as follows. In Section 2 we develop 
our model, including discussion of its formulation, details regarding estima- 
tion and significant results in terms of application and utilization. Section 3 
examines in detail the motivating example of real yield data in multiple 
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forecasting and assessment exercises. Finally, we conclude with Section 4 
containing a discussion of our key findings and some directions of future re- 
search. In an online supplement we illustrate simulation results and highlight 
the model's viability for both forecasting and parameter accuracy, especially 
in regard to applications outside of yield curve forecasting. In addition, our 
online supplement [Hays, Shen and Huang (2012)] provides technical proofs 
for the theorem and propositions presented in Section 2. 

2. Functional dynamic factor models. Abstracting for a moment from 
the present setting of yield curve forecasting, consider the more general 
process of a time series of curves {xi(t) :t G T; i = 1, . . . ,n}, where T is some 
continuous interval and i indexes discrete time. It is hypothesized that each 
curve is composed of a forecastable smooth underlying curve, Ui(t), plus an 
error component, £j(t), that is, 

(2.1) x i {t)=y i {t) + £ i {t). 

There are two primary goals of a functional time series model: to provide an 
accurate description of the dynamics of the series, and to accurately forecast 
the smooth curve y n +^(t) for some forecast horizon h > 0. 

In practice, of course, only a discrete sampling of each curve is observed. 
Specifically, consider a sample of discrete points {ti,t2, ■ • ■ ,t m } with tj G T 
for j £ {l,...,m}. The observed data for the ith curve are Xij = Xi(tj), 
j G {l,...,m}. 

2.1. The model. By synthesizing DFM and FDA, we propose a model re- 
ferred to as the functional dynamic factor model (FDFM). The formulation 
is similar to that of a DFM where the observed data {x^} is a function of 
a small set of K latent dynamic factors k = 1, . . . , K} and their corre- 
sponding factor loadings. But in this setting the factor loadings f^j = fk(tj) 
are discrete samples from continuous, unobserved though nonrandom fac- 
tor loading curves /&(•)■ Together, the dynamic factors with their functional 
coefficients generate the forecastable part of the time series of curves {xi(t)}. 

In theory, the dynamic factors can follow any type of time series pro- 
cess such as (V)ARIMA, but for the purpose of this paper we focus on 
factors which are independent, stationary AR(p) processes. Although it is 
not necessary for the number of lags p to be the same for each factor, as 
a matter of notational convenience we simply define p = maxjpi, . . . ,pk} 
and use the appropriate placement of zeros. The factors can include ex- 
planatory variables 3 or just a constant. In the former case, we have a 1 x d 
regressor vector having the d x 1 coefficient vector [i^. Similarly, we let 
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d = maxjdi, . . . , dx }, but do retain the option for the regressors themselves 
to differ among factors; thus, we continue to use the k subscript per fac- 
tor. Finally, for the model to be identified, we require that the functional 
coefficients are orthonormal. 4 The model is explicitly stated as 



with £i(tj) = £ij 7V(0,cr 2 ), v ik N(0,al) and E[v ik £ Vj ] = for = 



1, . . . , n. Should we require only a constant in place of regressors, then An-fik 
is a scalar fi^ for all i. With the assumption of stationarity, this yields the 
constant = — Y2r=i frk)- This is a broad framework that includes the 
standard versions of both DFMs and FPCA models: when the coefficients 
{fk(t)} are nonfunctional, model (2.2) reduces to the standard DFM; when 
the factors {f3 k } are nondynamic, the model is similar to FPCA. 

2.2. Estimation. With the error assumptions for model (2.2), we propose 
estimation via maximum likelihood (ML). To ensure smooth and functional 
estimates for the factor loading curves, we augment the likelihood expres- 
sion with "roughness" penalties [Green and Silverman (1994)] and maximize 
a penalized log-likelihood expression. Because our dynamic factors are unob- 
served, we consider this a problem of missing data, and use the expectation 
maximization (EM) algorithm [Dempster, Laird and Rubin (1977)] to esti- 
mate model parameters and smooth curves. 

2.2.1. Penalized likelihood. Let the n x m matrix X denote collectively 
the observed data where the (i,j)th element of X is Xij for i = l,...,n, 
j = 1, . . . , m. Each row of X corresponds to a yield curve for a fixed date; 
each column represents the time series of yield for a specific maturity. Next, 
we denote fkj = fk(tj), the m x 1 vector f' k = [fki, • • • , fkm]i an d the factor 
loading curve matrix F as 



so that the rows of F are the transposed column vectors f& [this conven- 
tion is to conform with some standard factor analysis matrix notation; see 



(2.2) 




F' = [fi,... 



fid 



4 Other types of constraints may be employed to ensure identification, such as conditions 
on the covariance function of the factor loading curves. 
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Basilevsky (1994), e.g.]. In a similar manner, we define (3 k = [f3±k • • ■ Pnk]' 
and the matrix B nx x = [(3 1 ■ ■ ■ (3 K ]. Thus, the columns of B are the time 
series factors P 1 , . . . , f3 K . Then, the model (2.2) is represented in matrix 
form as 

K 

(2-3) X nxm = Bnx^Fjfxm + £ n xm = ^ ] Pk^'k £ ' 

k=l 

where e = [eij] nxm with = 6i(tj). 

Assuming the matrix of dynamic factors B is observable, the log-likelihood 
expression can be obtained by successive conditioning of the joint distribu- 
tion for X and B: 

(2.4) Z(X,B) = Z(B) + Z(X|B). 

Because we have assumed that the K factors of AR(p) series are indepen- 
dent, their joint distribution is the product of the individual distributions. 
To each of those, we further condition on the first p values of each factor time 
series; thus, our likelihood (2.4) is a conditional one. For ease of notation 
we assume there are no regressors in the factor time series. Then 

K n K . / p \ 2 

(2.5) i(B) = (n- P )j2H^4)+ E E^(^- Cfc -E^ 

k=l i=p+l k=l k \ r=l 

and 



(2.6) /(X|B) = nm ln(27ra 2 ) + ^ E E ( x v ~ E ^fkj 

a i=l j=l V k=l / 

To ensure the underlying factor loading curve /&(•) is smooth, following 
Green and Silverman (1994), we introduce roughness penalties to (2.6) to 
obtain the following penalized log-likelihood: 

l p (X,B) = l(B) + l p (X\B), 

(2.7) 

z(x|B)+EA fc hrmfdt . 

k=l J 

The penalty parameter controls how strictly the roughness penalty is 
enforced, and we allow it to differ for each loading curve (thus the u k" 
subscript). The selection process for the penalty parameters is discussed in 
Section 2.5. We refer to the latter term in equation (2.7), / p (X|B), as the 
penalized sum of squares (PSS). Intuitively, optimization of PSS balances 
a familiar goodness-of-fit criterion with a smoothness requirement for the 
resulting estimates of fk(t)- 

Below we assume the dynamic factors are known and discuss how to 
estimate the AR model parameters and the smooth factor loading curves. 
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When the dynamic factors have no regressors the conditional MLEs for the 
AR parameters ({c^, y>i,fcj . . • , <P Pl k}) are the same as the ordinary least 
squares (OLS) solutions. In the case where the factors do have regressors, 
an additional step is required to alternatively solve for the AR parameters 
{(fi,k, • • • i (fp,k} and the regressor coefficient vectors {/ifc}. The resulting so- 
lutions are the (feasible) generalized least squares (GLS) solution; see Judge 
(1985) for a detailed discussion. We do consider this general formulation in 
the simulation studies reported in our online supplement; a brief discussion 
follows Section 4. 

Now we discuss how to estimate the loading curves ft (t) . In order to allow 
the curves to have their own smoothness, through allowing different A&, we 
proceed in a sequential manner to estimate fk(t) one at a time, incorporating 
penalty parameter selection for that loading curve through cross-validation, 
as discussed in Section 2.5. 

According to Theorem 2.1 of Green and Silverman (1994), for fixed k, 
the minimizer /&(•) of PSS is a natural cubic spline with knot locations 
ti,...,t m . Further, this NCS interpolates the discrete vector which is the 
solution to the minimization problem 

(2.8) mm[Z(X|B) + A fe 4«f fe ], 

ft 

where fi mX m is a matrix determined solely by the spline knot locations; the 
explicit formulation of ft is deferred until Section 2.3. 

Let X = vec(X) which stacks the columns of X into an nm x 1 vector. 
Then using the Kronecker product (g>, model (2.3) can be rewritten in vector 
form as 

K 

X = (F' ® I n )(3 + vec(e) = £(f fc <g> I n )/3 k + vec(e), 



(2.9) 



k=l 



K 



X = ^(/3 fe <g>I m )f fc + vec(£). 



k=l 



The lattermost form facilitates a straightforward derivation of the opti- 
mal factor loading curves. To see this, consider the solution for fixed 
fee {1, . . . , K} = K. For the remaining h £ K, we define X* = X — ^ k (.Ph ® 
I m )f/i. Then the minimization problem (2.8) is equivalent to 

2 



(2.10) 



mm 



1. 



-X* 



1 



a 



(7 



(A 



+ A fc f^f fc , 



where || ■ || is the Euclidean norm. Expanding the first term and differenti- 
ating with respect to yields the solution 



(2.11) 



f - 1 

tfc — — 9 



11/3* 



J -m 
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or a- 2 S(I m /3fc)X* for S = [^-l m + Afcfi]" 1 ; S = S(A fc ). In Section 2.5 



we derive a generalized cross-validation (GCV) procedure for the selection 
of each Afc. 

2.2.2. EM algorithm. In the realistic situation that B is unobservable, 
we treat it as missing data and resort to the EM algorithm for maximizing 
the observed data log-likelihood. First, the EM is inaugurated with initial 
values for the factors and factor loading curves. From these initial values, 
maximum likelihood estimates for the remaining parameters from are 
calculated based on equations (2.5), (2.6) and (2.7); we call this Step 0. Then 
the algorithm alternates between the E-step and the M-step. In the E-step, 
values for the factor time series are calculated as conditional expectations 
given the observed data and current values for the MLEs. In the M-step, 
MLEs are calculated for the factor loading curves and other parameters 
based on the factor scores from the conditional expectations in the E-step. 
After the initial step, the E-step and the M-step are repeated until differences 
in the estimates from one iteration to the next are sufficiently small. More 
details are given below. 

Step 0: Akin to the method used in Shen (2009), initial values for B are 
composed of the first K singular values and left singular vectors from the 
singular value decomposition (SVD) of the data matrix X. Initial values 
for F are the corresponding right singular vectors. From these, initial pa- 
rameter estimates are computed for a 2 and the set of factor parameters 
c fc) fi,k, • • • j Pp,k}- 

The E-step: Derivation of the conditional moments for the E-step requires 
the expressions of some of the unconditional moments. Define the n x n 
variance matrix for /3 k as E&, and let c be the K x 1 vector with elements 
Cfc/[1 — (J2r=i fr,k)]- Then, using equations (2.9), 



Next, using properties of multivariate normal random vectors, the condi- 
tional distribution of /3|X can be found. Let 



(2.12) 



Cov[/3, X] 
Var[X] 



E[(3] 
Var[/3] 



^ / g = C®l„, f?[X]=/i x =(F'®I n )/X j g 

E / 9 = diag{Si,...,Ejf}, 

E /3 ,x = S^(F®I n ), 

E x = (F' ® I n )E^(F I n ) + a 2 I nm . 




Then 



(2.13) 
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From a computational standpoint there is concern over the inversion 
of Sx which is of order nm. Because the EM is an iterative procedure, 
this could be especially problematic. However, we can use the following re- 
sult based on the Sherman-Morrison- Woodbury factorization [Press et al. 
(1992), e.g.] to simplify the computation: 

Proposition 2.1. 
(2.14) S x x = a- 2 l nm - a- 4 (F' ® l n )[a- 2 l nK + E^ 1 ]" 1 ^ ® I n ). 

A derivation of this result is included in our online supplement; the form 
of the result is not so important as what it means. Instead of inverting Sx 
directly, which is an nm x nm matrix, only the middle matrix [a~ 2 I n K + 
Sg 1 ] needs to be inverted. This matrix is of smaller size nK x nK. Further, 

as X!^ is block diagonal, then a~ 2 I n K + S^ 1 is as well. Thus, using this 
factorization, the inversion of Sx is reduced from an nm x nm inversion 
to K, n x n inversions. 

With the conditional moments, the E-step of the EM posits that the 
missing data (the time series factors) are replaced with the known values of 
the conditional distribution given X. Thus, in the following M-step, in solving 
for the MLEs, expressions involving f3 k will utilize values from /x^x, ^/3\x 
and E[f3f3'\X\. 

The M-step: For each EM iteration, the M-step optimizes the condi- 
tional penalized log-likelihood in equation (2.7) given the observed data 
and the current parameter estimates for 0. It is clear from equations (2.5) 
and (2.6) that in the MLEs, the factor time series appear either singly or in 
terms of cross products both within and between factors. Values for terms 
like fiik come directly from the vector /-t^ x- But because a term like fiik'fihk, 
k, k' = 1, . . . , K, i, h = 1, . . . , n, is a conditional expectation of a product, its 
replacement values are obtained from the E[f3(3'\X] matrix. We will show 
in Section 2.5 some rather fortunate results to simplify computation of the 
conditional expectation of the factor products. 

The M-step, then, is just a matter of making these substitutions into the 
likelihood, and solving for the MLEs. After the M-step, we return to the 
E-step to update the values for the factor time series. This procedure is 
repeated until the parameter estimates from one iteration of the EM are 
sufficiently close to those of the next. 

2.3. Connection with natural cubic splines. We now explain the origin 
of the penalty matrix ft from equation (2.8) following Green and Silverman 
(1994). Let hj = tj + \ — tj. For j = 1, . . . ,m, we define the banded matrix 
Qmx(m-2) with columns numbered in a nonstandard way: elements qjji de- 
note the j = 1, . . . , mth row and j' = 2, . . . , (m — l)st column of Q. These 
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elements, in particular, for \j — j'\ < 2, are given by 

(2.15) qj - ld = hj\, qjj = -hj\ - hj 1 , q j+1 j = hj\ 

and are otherwise. Further, we define the symmetric matrix R( m -2)x(rn-2) 
with elements rjj>;j,j' = 2, . . . , (m — 1) such that r,j/ = for \j — j'\ > 2 and 

r n = _ M> for j = 2, . . . ,m - 1, 

r JJ+ i = r J+ ij = \{hj-\ - hj), for j = 2, . . . , m - 2. 

Note that R is diagonal dominant and, thus, it is positive definite and 
invertible. Let 

(2.17) n = QR- 1 Q'. 

The following result is based on Theorem 2.1 of Green and Silverman (1994). 

Proposition 2.2. For fixed k, the /&(■) optimizing PSS in (2.7) is a nat- 
ural cubic spline with knot locations at tj, and 




/> 



[fmrdt ; = f£Of fc . 

A proof of Proposition 2.2 is included in our online Supplement. 

2.4. Forecasting and curve synthesis. Recall that the goal of our Func- 
tional Dynamic Factor Model (FDFM) is to provide forecasts of an entire 
curve from an observed time series of sampled curves. Once the FDFM has 
been estimated, it is a straightforward exercise to do just this. Further, due 
to the functional nature of the model, we are not restricted to forecasts for 
only the observed knot locations; the natural cubic spline (NCS) results of 
Section 2.3 allow us to forecast to any degree of fineness between knot loca- 
tions. Indeed, Proposition 2.2 even allows within sample imputation of an 
entire time series. 

Forecasting is straightforward: for illustrative purposes, suppose we esti- 
mate our FDFM with K factors following an AR(1) process with constants 
{cfc}, k = 1, . . . , K. Then the /i-step ahead forecasted curve x n+ ^ n (t) is based 
on the components of the forecast of the factor time series /3 n +h| nj fc and the 
estimated factor loading curves fk(t): 

K 

■^n+h\n (t) ~'^2^n+h\n,kfk(t), 
k=l 

h-l 

Pn+h\n,k = Cfc + tpkPn+h-l,k = X] ^ ^ k + PkPnk- 

r=0 



(2.18) 
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The NCS result of Section 2.3 ensures that fk(t) is indeed a function 
rather than a discrete set of points. Thus, we can interpolate fk(t) to any 
degree of fineness between any two knot locations tj and tj+i- 

Specifically, consider t € [tj,tj + i];j = l,...,m. We can compute values 
for an entire time series {xi(t)}^ =1 because each fk(t) is an NCS. Denote 
7 fcj - = f'f!(tj). It can be shown [Green and Silverman (1994)] 

(2.19) 



1 + ^)^ 1+ ( 1 + ^P )7fcj 



for each k = 1, . . . , K. For t < ti, or t > t m , the /fc(t) is a linear extrapolation, 
which may or may not perform well depending on whether the linearity 
assumption beyond the boundary knots is suitable for the application of 
interest; we illustrate this limitation in Section 3.3.2. Using this method 
together with equations (2.18), we can just as easily impute and forecast at 
the same time, a result that enables, for example, yield forecasts for bonds 
of maturities that have not been observed. 



2.5. Computational efficiency. This section presents results intended to 
ease some of the computational aspects of the estimation for the functional 
dynamic factor model, the reason for this being that the EM algorithm is 
an iterative procedure and each iteration is rife with large matrix inversions 
and manipulations. Further, given the results of Section 2.2.1, we propose to 
sequentially solve for each factor loading curve f^; k = 1, . . . , K. Finally, the 
smoothing parameter needs to be selected in a data-adaptive manner for 
each k. Below, we present a (generalized) cross-validation (GCV) procedure 
to achieve this. Efficient implementation allows us to easily evaluate the 
GCV score over many candidate values of 

GCV selection: In general, cross-validation is based on sequentially leaving 
out sections of the observed data, estimating a model for each "leave-out" 
and computing some metric for how well the model predicts the left out 
sections. Although a popular method for GCV in FDA is row/curve deletion, 
because the present setting involves a dynamic system of curves, deletion of 
a curve removes an entire time point from the data and destroys the time 
dependency structure. Therefore, here, we pursue a GCV criterion based 
on a leave-out of each series or column. In either sense, it is costly to re- 
estimate the model when each of m columns or n rows of the data X are 
deleted, for each candidate value of Xk and for each k. Fortunately we have 
the following result that obviates re-estimation of the FDFM for each column 
leave-out: 
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Theorem 2.1. Let X* = X - Y^h^k^A- Then the GCV criterion for 
each X k based on column deletion is explicitly expressed by components of 
estimation on the complete data: 

(I m -\\f3 k \\ 2 /a 2 S)(X*y(3 k \\ 2 /m 



(2.20) GCV(A fc ) 



[l-tr(\\/3 k \\*/<T*S)/m]* 



The proof of Theorem 2.1 is found in our online supplement. GCV(Afc) 
is calculated over a grid of possible values during the M-step of each EM 
iteration for each factor loading curve. The smoothing parameter that cor- 
responds to the least value of GCV(-) is selected as the optimal one. It 
is worthwhile to note that this can be a computationally intensive proce- 
dure: calculating GCV(A) for several values for A during each EM iteration 
and for each factor. Criterion (2.20) depends on the inversion of the matrix 
g-i _ [ H^fcll \ m _j- Afcfi] . Using the eigen-decomposition of fi, a method exists 
for which the only inversion required is the inversion of a diagonal matrix. 
Consider the following proposition, the derivation of which is included in 
our online supplement: 

Proposition 2.3. Given the eigen-decomposition of the mxm penalty 
matrix ft = TA.T' with A roxm = diagj^j}™^, then 

S(A fc )=r-diagj(M^ + V ^ V 



and 



tr{S(A fc )} = E 



J \\(3 k \\ 2 /° 2 + AfcV 

Thus, a single eigen-decomposition, followed by a diagonal matrix inver- 
sion for each of the factors, circumvents performing an m x m inversion for 
each of the K factors and each of the candidate values for Afc. 

Block diagonality: In the M-step, when products of the factors appear, 
such as (f3 k ,(3 h ) = E[{/3 k ,P h )\'X], then the imputation comes from the E\j3j3 |X] 
matrix. It can be shown that ^p\x 1S block diagonal; this property facili- 
tates a rather convenient result regarding between-factor cross products (the 
derivation of this result is found in our online supplement). 

Proposition 2.4. £/3|x is block diagonal with K nxn blocks. Further, 
forh^k, E[(p k ,p h )\X] = (j*p h \x, t*p h \x) . 

Therefore, the conditional expectation of a product of two (distinct) fac- 
tors is simply the product of their individual expectations. This greatly 
simplifies the M-step calculations. 
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3. Application to yield curve data. 

3.1. Yield curve data. In this section we consider the application of our 
functional dynamic factor model to actual yield data. We use the same data 
set as Diebold and Li (2006) which consists of a sample of monthly yields 
on zero coupon bonds of eighteen different maturities (in months): 

1.5, 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120, 

from the period January 1985 through December 2000 (192 months), orig- 
inally obtained from forward rates provided by the Center for Research in 
Securities Prices (CRSP), then converted to unsmoothed Fama-Bliss yield 
rates [see Fama and Bliss (1987) for the conversion methodology]. 

3.2. The dynamic Nelson-Siegel model (DNS). In the following sections 
we compare the FDFM with the DNS model presented in Diebold and Li 
(2006). Their model is composed of three factors with corresponding factor 
loading curves. The factor loadings are pre-specified parametric curves (see 
the dashed curves in Figure 1) based on financial economic theory. Let Xiit) 
denote the yield at date i on a zero coupon bond of maturity t, then the 
DNS model is represented as 



(3.1) 



%i(t) = ^2Pi,kfk(t) + £i(t) fori = l, 

k=l 

h(t) = l, / 2 ( t )^- eXP( -^ 



,n, 



f 3 (t) = f 2 (2) -exp(-ait), 

. Pi,k = c fc + <PkPi-i,k + d,k for k = 1, 2, 3, 

evaluated at maturities tj, j = 1, . . . ,m. The first loading curve fi(t) is con- 
stant and intended to represent the long-term component of yields (level); 
the second fzif) represents a short-term component, or slope. Finally, the 
third loading fs(t) represents a mid-term component, or curvature. The pa- 
rameter on determines the point t*(at) at which /a(t) achieves its maximum. 
While this can be estimated as a fourth factor [see, e.g., Koopman, Mallee 
and Van der Wei (2010)], Diebold and Li (2006) set Oj to a fixed value for all 
i = 1, . .. , n. This results in entirely predetermined, parametric curves. The 
specific value a = 0.0609 is determined by their definition of "mid-term" as 
t = 30 months. 

Estimation of the DNS model is a two-step procedure. First, time series of 
factor scores of /3i t k are estimated by ordinary least squares (OLS) of Xi(tj) 
on [1, f2(tj), /^(tj)] for j = 1, ... ,m at each time point i = 1, . . . , n. Second, 
an AR(1) model is fit on each series jH^k for the purpose of forecasting j3 n+ \^ 
and ultimately x n+ \(tj) via equation (2.18) from Section 2.4. 
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(a) First Factor Loading Curve (b) Second Factor Loading Curve 
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(c) Third Factor Loading Curve 
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Fig. 1. Example of factor loading curves: FDFM curves (solid, left axis) estimated from 
the period May 1985 to April 1994; pre-specified DNS curves (dashed, right axis). FDFM 
estimates closely resemble the shape of the DNS curves for the second and third factors, 
while the first FDFM factor loading curve resembles a typical yield curve shape. Dual axes 
are used to account for difference in scale: FDFM is represented on the left axis; DNS on 
the right axis. 



3.3. Assessment. We assess the performance of the FDFM in three dis- 
tinct exercises. The first two are traditional error based assessments of fore- 
casts or within-sample predictions of yield curves or sections thereof. The 
final application is a combination of both forecasting and curve synthesis. 
Through an adaptation of the trading algorithms introduced in Bowsher and 
Meeks (2008), we develop trading strategies based on the forecasts of the 
FDFM and DNS models and assess the resulting profit generated by each. 

For each of these, as a comparison, we use the DNS specification aforemen- 
tioned above in Section 3.2. For the purpose of making an unbiased compar- 
ison, we use a similar formulation of our FDFM model with 3 factors follow- 
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ing independent AR(1) processes. The key distinction between this FDFM 
model and the DNS model is that the FDFM estimates the model simul- 
taneously: the smooth factor loading curves and the AR(1) parameters are 
estimated in a single step. In contrast, the estimation for the DNS model re- 
quires two steps given the pre-specified factor loading curves: first the factor 
time series are estimated; from these the AR(1) parameters are determined. 

The key distinction between the two models raises an interesting ques- 
tion: How do the factor loading curves between the two models compare? 
Figure 1, panels (a)-(c), show an example of the factor loading curves esti- 
mated by the FDFM (solid line) for the period May 1985 through April 1994. 
Pictured alongside, the dashed line plots the DNS model curves. Recall the 
DNS motivation for the form of fi, fi and was an economic argument, 
while the formulation of the FDFM described in Section 2 is based entirely 
on statistical considerations. Despite this, we see that the FDFM model is 
flexible enough to adapt to a specific application. Factor loading curves f2(t) 
and /3 (t) from the FDFM assume the behavior of those from the DNS model 
without imposing any constraints that would force this. Thus, the FDFM in- 
herits the economic interpretation of f2(t) and /3(f) set forth in Diebold and 
Li (2006). In the case of fi(t), the FDFM version resembles a typical yield 
curve shape as opposed to a constant value for DNS; however, inspecting the 
magnitude suggests that departure of the FDFM version from a constant 
value is small. Less typical yield curve shapes are usually characterized by 
deviations in the short and mid-term yields from the norm. This is exactly 
what f2(t), fs(t) and their corresponding scores capture. Thus, we consider 
the first factor as the mean yield, while the second and third account for 
short and mid-term deviations from this norm. 

3.3.1. Forecast error assessment. In this section we compare the FDFM 
and DNS models using a rolling window of 108 months to forecast the yield 
curve 1, 6 or 12 months ahead. For example, for the one month ahead forecast 
we fit the models on the first 108 months of data and forecast the 109th 
month, then fit the models on the 2nd through 109th month and forecast 
the 110th month, etc. Yields on bonds of maturity less than three months 
are omitted in order to match the methodology used in Diebold and Li 
(2006). To compare the models, we use the mean forecast error (MFE), root 
mean squared forecast error (RMSFE) and mean absolute percentage error 



(MAPE): 
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Table 1 

MFE, RMSFE and MAPE: 1, 6 and 12 month ahead yield curve forecast results. The 
better result between the two models is highlighted in bold. For 1 month ahead forecasts, 
the FDFM results in lower (magnitude) MFE for most maturities, but results are mixed 
for 6 and 12 months ahead. RMSFE and MAPE is typically lower with the FDFM for 1, 

6 and 12 months ahead 





1 month ahead 


6 months 


12 months 


Maturity 


DNS 


FDFM 


DNS 


FDFM 


DNS 


FDFM 








MFE 








3 months 


-0.045 


0.026 


0.123 


0.172 


0.203 


0.257 


1 year 


0.023 


0.035 


0.177 


0.168 


0.229 


0.215 


3 years 


-0.056 


0.015 


0.022 


0.060 


0.003 


0.013 


5 years 


-0.091 


-0.004 


-0.079 


-0.021 


-0.166 


-0.133 


10 years 


-0.062 


-0.023 


-0.139 

RMSFE 


-0.121 


-0.316 


-0.318 


3 months 


0.176 


0.164 


0.526 


0.535 


0.897 


0.867 


1 year 


0.236 


0.233 


0.703 


0.727 


0.998 


0.967 


3 years 


0.279 


0.274 


0.784 


0.775 


1.041 


0.947 


5 years 


0.292 


0.277 


0.799 


0.772 


1.078 


0.953 


10 years 


0.260 


0.250 


0.714 

MAPE 


0.697 


1.018 


0.921 


3 months 


2.58 


2.50 


8.21 


8.11 


12.99 


12.05 


1 year 


3.37 


3.30 


10.25 


10.29 


12.70 


12.08 


3 years 


3.79 


3.77 


11.68 


11.33 


14.16 


12.71 


5 years 


3.88 


3.81 


11.94 


11.42 


14.93 


13.41 


10 years 


3.24 


3.24 


10.49 


9.99 


14.22 


13.22 



MAPE ■ = — \x n +h(tj) - X n+h {tj)\ 

3 r ~[ X n+h (tj) 

where r = 84, 79, 73 is the number of rolling forecasts for forecast horizon 
h = 1, 6, 12, respectively. 

A summary of the forecasting performance is shown in Table 1. For month 
ahead forecasts, the MFE is lower (in magnitude) with the FDFM for four 
out of the five displayed maturities (highlighted in bold), while RMSFE is 
lower for all five. For six months ahead, DNS outperforms FDFM just 2 out of 
five times in both MFE and RMSFE. For twelve month ahead forecasts, DNS 
outperforms FDFM in MFE for 3 of 5 displayed maturities. However, FDFM 
has lower RMSE for all 5 maturities. In terms of MAPE, the FDFM exhibits 
lower MAPE than DNS nearly uniformly for the displayed maturities and 
for 1, 6 and 12 month ahead forecasts. 



3.3.2. Curve synthesis. Because each factor loading curve /&(•) is an 
NCS, between any two observed maturities tj and ti+i, we can calculate 
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Fig. 2. Example of curve synthesis: Entire time series of yields are omitted from esti- 
mation, then "filled in" using the imputation described in Section 2.4- Here, 3 consecutive 
maturities have been omitted, resulting in 3 missing time series corresponding to these 
maturities. 

the value for fk(t). It follows, then, that between any two time series of 
yields {xi(tj)}f =1 and {^i(tj+i)}f =1 , we are able to replicate an entire time 
series for the intermediate maturity t: {xi(t)}™ =1 . 

To illustrate this point, we use the entire data set (see introduction of 
Section 3), that is, use i = 1, . . . , n = 192 months of yield data for maturi- 
ties tj, m = 18. For both the DNS and FDFM models, we delete a set of 
adjacent time series from the data, estimate the model, then assess the pre- 
diction error of the predicted series in reference to the actual deleted series. 
Specifically, for our data matrix X nxm with columns xi,...,x m , we omit 
I = 1, . . . ,L consecutive columns from X, then estimate the model on the 
remaining Q = m — L maturities. From this we compute the L time series 
of missing data: x^, . . . ,xj + l; an example for the case where L = 3 is shown 
in Figure 2. For each choice of L, we delete a "horizontally" rolling window 
of width L maturities and estimate the model on the remaining Q maturi- 
ties, R = m — L + 1 times. As an example, for L = 3, we can estimate the 
models on X4,...,x m and predict xi,...,X3; then estimate the models on 
xi,X5, . . . ,x m and predict X2, . . . ,X4, etc. 

We examine the RMSFE for the Zth omitted maturity of the rth sequence; 
r = 1, . . . , R; I = 1, . . . , L. Because the models are estimated based on a rolling 
window of maturities, for each choice of L a time series Xj for yield tj will be 
estimated multiple times. Therefore, for each choice of L we take the mean of 
the RMSFE of the predicted series for each maturity. We further average over 
our definitions of short (t S [1.5,21)), mid (t £ [21,36]) and long-term (t € 
(36,120]) horizons. Finally, we average over all maturities as a one-number 
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Table 2 

Average RMSFE; FDFM as a fraction of DNS: (a) with extrapolation (b) without 

extrapolation 



Omitted 




(a) With 


extrapolation 




(b) Without 


extrapolation 




Short 


Mid 


Long 


All 


Short 


Mid 


Long 


All 


1 


0.88 


0.97 


1.05 


0.95 


0.84 


0.97 


1.04 


0.94 


2 


0.95 


0.90 


1.13 


1.00 


0.90 


0.90 


1.01 


0.94 


3 


0.94 


0.98 


1.06 


0.98 


1.00 


0.98 


1.00 


0.99 


4 


0.87 


0.93 


1.64 


1.14 


0.99 


0.93 


1.07 


1.01 


5 


0.99 


1.01 


0.88 


0.95 


1.07 


1.01 


0.99 


1.03 


6 


0.99 


1.00 


1.67 


1.26 


1.05 


1.00 


1.20 


1.09 


7 


1.34 


1.11 


0.93 


1.13 


1.24 


1.11 


1.16 


1.19 


8 


0.92 


1.17 


1.82 


1.39 


1.30 


1.18 


1.48 


1.33 



summary. These results are presented in Table 2 with FDFM as a fraction of 
DNS. Because prediction for the FDFM model outside the range of the data 
is linear extrapolation, 5 we expect these to become increasingly inaccurate 
as L grows large. Thus, results are also presented excluding extrapolated 
predictions in order to better illustrate the truly functional predictions of 
the FDFM. 

In general, as L increases from 1 to 8 we see the expected decline in the 
performance of the FDFM model relative to DNS. In panel (a) of Table 2 the 
average RMSFE on short-term bonds for the FDFM remains surprisingly 
robust as we delete more and more maturities. On mid-term bonds, DNS 
results in lower prediction error when the number of deleted series reaches 5 
or more. For long term, DNS more or less outperforms FDFM across the 
board (this trend will be echoed in Section 3.3.3). These results are simi- 
lar whether or not the extrapolated results are included. Perhaps the best 
summary is the last column in each of panel (a) and (b) of Table 2, where, 
beyond 3 or 4 omitted maturities, the parametric based DNS model begins 
to outperform the FDFM. 

3.3.3. Portfolio-based assessment. RMSFE-type assessment provides 
a good diagnostic measure of forecast performance from a statistical per- 
spective. However, as Bowsher and Meeks (2008) argued in their paper, 
in applied economic settings, a pure error-based assessment measure may 
fail to fully explain the financial implications of having used a particular 
model. Therefore, in this section we consider an adaptation of the profit 
based assessment introduced therein. By using modified versions of their 
three trading strategies, we create portfolios based on the model forecasts, 



5 This is due to the NCS framework; see Section 2.3 for details. 
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then measure the cumulative profit of the strategy. This also serves as a good 
capstone exercise for our presentation of the FDFM, as it simultaneously in- 
volves both forecasting and curve synthesis: the primary uses for our model. 

In each strategy we use the same rolling window of 108 months as de- 
scribed in Section 3.3.1 so that the trading algorithm is employed every 
month over the course of 84 months. Each period i we create a portfolio 
consisting of a $1M purchase of one bond or set of bonds and a correspond- 
ing sale of another bond or set of bonds for the same amount. Therefore, 
the net investment per period is $0. The decision of which bond to sell and 
which to buy is made based on the sign of the predicted spread in their one 
period returns. 

At time i + 1 we cash out our portfolio and record the cumulative profit 
over the 84 month trading period. Denoting the yield at time i of a zero 
coupon bond of maturity t months as Xi(t), the price of the bond at time i 
is 

(3.2) Pi(t) = exp[-tXi(t)]. 

Correspondingly, the price the next period (month) is then Pi + \(t — 1) = 
exp[— (t — l)xj+i(i — 1)] since in the month that has elapsed the maturity is 
reduced by, not surprisingly, one month. We denote the one period return 

as 

'Pi+i(t-l) 



(3.3) Ri+i(t) 



1. 



and the log one period return as ri + \{t) = ln[l + Ri + i(t)}. Equations (3.2) 
and (3.3) imply 

(3.4) r i+ i(t) = txi{t) - (t - l)xi +1 (t - 1). 

Thus, for a forecasted yield x i+ iu(t) we have r i+1 ij(t) = txi(t) — (t — l)x i+1 ^(t- 
1), which is a combination of both actual and forecasted yields. We use the 
data presented in Section 3.1 and thus are limited to a set of nonconsecutive 
observed maturities. Akin to Bowsher and Meeks (2008), we rely on linear 
interpolation of Xi(t — 1) to provide the yield for Xi(t) and use the same 
random walk forecast (RW) as a benchmark by which to compare models: 

(3.5) x i+1 (t) = x i (t)+r li+1 (t), »*+i(*) L ~" WN(0,v 2 ), 
with forecast x i+ m(t) = Xi(t). 

Algorithm 1. For this strategy, we adopt the method used in the sec- 
ond algorithm presented in Bowsher and Meeks (2008). Ours differs slightly 
since the data we use (introduced in Section 3.1) does not contain the two 
month maturity. Let T = {4, 5, . . . , 13, 16, . . . , 85}, t\ = 4 and t%j £ T \ {4}; 
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Table 3 

Algorithm 1; Weighted pairs. Use of the FDFM model results in nearly twice the 
cumulative profit produced from the DNS model 





Profit (x$1000) 




Directional accuracy of sub-portfolios 




Percentile 


+ 


Model 


Cumulative Median 10th 


90th 




FDFM 

DNS 

RW 


1089 5.06 -101.92 
519 5.07 -110.77 
-94 -10.52 -190.5 


149.53 
116.02 
163.64 


1102/1520 (72.5%) 392/1252 (31.3%) 
926/1520 (60.9%) 538/1252 (43%) 
1274/1520 (83.8%) 97/1252 (7.7%) 



j = 1, . . . ,33. Every period i we form a portfolio of sub-portfolios with two 
bonds {ii,t2,j}- Define weights Wj as the proportion of the historical abso- 
lute excess return on portfolio {ti,t2j} to the sum over all j of the same: 

= T,i\Ri{t2,j) - Ri{ti)\ 

where i spans the period January 1985 to December 1993. 

To borrow some notation from Bowsher and Meeks (2008), let dij rep- 
resent the investment rule for the amount at time i invested in each jth 
sub-portfolio. To determine the amount invested in each sub-portfolio, let 

dij = $1M x wj x sgn[r i+1 | i (t 2 j) - r i+1 |j(ti)]. 

We set dij = in the off chance of rj +1 |j(t2j) = Let 7Tj + i denote 

the time i + 1 profit resulting from these rules. Then 

vTi+i = y^dij[Ri + i(t 2 j) - R i+ i(ti)} ~y^<4?[ri + i(t 2 j) - r i+ i(*i)]. 
j j 

The results of this trading strategy are summarized in Table 3. Use of the 
FDFM model results in nearly twice the cumulative profit produced from 
the DNS model. Also shown is the capability of each model in successfully 
predicting the positive (1520) and negative (1252) actual spreads of the 
sub-portfolios in each period. Surprisingly, the random walk model has the 
greatest accuracy in predicting positive spreads (84%), as compared to the 
FDFM (73%) and DNS (61%) models. All three models are less accurate in 
the prediction of a negative spread, though RW is the worst by far (8%). 

Algorithm 2. The strategy in Algorithm 1 is a fairly basic one: to 
use every available bond at our disposal to predict the spread between its 
return and a short-term bond. Our second strategy is more sophisticated by 
creating portfolios of an optimal pair of bonds each period i. Given a fixed 
value of t\, we choose tn to optimize the absolute spread in predicted return 
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each trading period: 

(3.6) i 2 i = argmax|f i+1 u(i) - r i+ i|i(*i)l- 

This is an adaption of the third algorithm presented in Bowsher and 
Meeks (2008). There, in a single exercise, the authors fix t± = 3 and select t 2 i 
according to equation (3.6) each trading period. Here, we examine multiple 
choices for t\ and determine t 2 % according to equation (3.6) each trading 
period for each choice of fixed t\. Because we examine multiple portfolios, 
we use a sparser set of maturities in this exercise than previously, though of 
the same range. This set is defined by the observed maturities of Section 3.1: 

t u t 2i € T = {4, 7, 10, 13, 16, 19, 22, 25, 31, 37, 49, 61, 73, 85}. 

We perform this exercise for all choices of t\ and t 2 i as long as t\ < t 2 i, 
and compare the results. Our investment rule di at time i and resulting 
profit 7Tj_|_i the next period is of a similar form to Algorithm 1: 

di = $1M x sgn[r i+1 |j(t 2 i) - f i+ i\i{h)], 

■K i+1 = di[R i+1 (t 2 i) - R i+ i(h)] ss di[r i+1 (t 2 i) - r i+1 (ti)\. 

Again, we set di = whenever r i+1 ^(t 2 i) = ^+i|j(ii)- 

The results of the strategy are shown in Table 4. When the choice of t\ 
is six months or less, the DNS model generates greater cumulative profit 
than either of the other models. However, when the choice of t\ is within 9 
and 36 months, the FDFM consistently generates significantly greater profit 
than the DNS and RW models. Thus, when we are free to pick the bond 
that optimizes the predicted spread each period, the FDFM performs rather 
well, provided the maturity of the first bond is within a certain range. Our 
final strategy expands upon this idea. 

Algorithm 3. Because the choice of the optimal second bond can vary 
from one period to the next in Algorithm 2, it is not clear what a consistently 



Table 4 

Algorithm 2: Optimal pairs portfolio 
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Profit (x$1000) 
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Profit (x$1000) 
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RW 


FDFM 
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3 


1013 


3574 


-228 


Mid 


21 


1246 


202 


680 




6 


1381 


2828 


-133 




24 


1592 


242 


70 




9 


1061 


1013 


-297 




30 


2284 


203 


-951 




12 


1873 


-367 


-307 




36 


1466 


919 


-173 




15 


1519 


-582 


-432 


Long 


48 


-361 


589 


236 




18 


1081 


-481 


-263 




60 


740 


339 


-284 














72 


-131 


-1 


72 



FUNCTIONAL DYNAMIC FACTOR MODELS 



23 



good combination is. Thus, for our third strategy we consider an exploratory 
and exhaustive approach as a diagnostic assessment of with which combi- 
nation of bonds our model excels. As such, we expand our set of bonds to 
include those of longer maturity: 

tt,t 2 eT = {4, 7, 10, 13, 16, 19, 22, 25, 31, 37, 49, 61, 73, 85, 97, 109}. 

In this modification of strategy 1 from Bowsher and Meeks (2008), the 
portfolio is a simple one consisting of two bonds with maturities t± and t 2 . 
For the duration of the strategy, these maturities remain fixed over all pe- 
riods i = 1, . . . ,84. As before, the decision at time i of which bond to sell 
and which to buy is made based on the predicted direction of the spread in 
log one period returns: di = $1M x sgn[r i+1 |j(t2) — ^i+i|i(ii)] [we set di = 
whenever r i+ i^(t 2 ) — This yields the time i + 1 profit 

= di[R i+ i(t 2 ) - Ri + i(ti)] w di[r i+ i(t 2 ) - rj+i(ii)]. 



7T, 



We examine the cumulative profit of all combinations of this type or portfolio 
such that t 2 > t\. 

Figure 3 depicts the results of our final trading strategy. For each com- 
bination of t 2 > t\, the name of model with the largest cumulative profit 
is displayed in that cell by the first initial of its acronym ("F" for FDFM, 
e.g.). A "+" or "— " suffix indicates the largest profit was positive or nega- 
tive, respectively. 

The FDFM model typically has the greatest profit when t 2 € {30, . . . , 72}. 
These results are consistent with Sections 3.3.1 and 3.3.2: the FDFM was 
either comparable or better on RMSFE for forecasting and for imputation 




FlG. 3. Algorithm 3: All combinations of portfolios for £2 > ti- The model with the largest 
cumulative profit is displayed by the first initial of its acronym with "+ " or "— " indicating 
positive or negative profit. 
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on maturities in this range. We also see a certain similarity in these results 
to those of Algorithm 2. Namely, that the FDFM typically outperformed 
the other two models when t\ was exactly in this range. 

For the longest maturities (> 72), the DNS model results in greater profit 
when t\ < 48. Results for other regions are mixed. Recall from Section 3.1 
that in our data short and mid-term yields are typically spaced either 3 or 6 
months apart, whereas long-term maturities are spaced 12 months apart. 
As we saw in Section 3.3.2, as the spacing between maturities increased, 
the FDFM model eventually broke down; it is, after all, very much a data 
driven model. DNS, on the other hand, maintains the same factor loading 
curves regardless of the data, which could explain its greater profits at long 
maturities. 

4. Conclusion and discussion. In this paper we developed a method for 
modeling and forecasting functional time series. This novel approach synthe- 
sizes concepts from functional data analysis and dynamic factor modeling 
culminating in a functional dynamic factor model. By specifying error as- 
sumptions and smoothness conditions for functional coefficients, estimation 
by the Expectation Maximization algorithm results in nonparametric factor 
loading curves that are natural cubic splines. Thus, for a given time series 
of curves we can forecast entire curves as opposed to a discrete multivariate 
time series. 

The motivating application is yield curve forecasting, where existing ap- 
proaches typically exhibit a trade-off of consistency-with-economic-theory 
and goodness of fit. However, through multiple forecasting exercises we show 
that our model satisfies both of these criteria. A further online supplement 
underscores these results and also showcases the model's viability to settings 
well outside of economics and yield curve forecasting and where a prior the- 
ory does not exist. Indeed, this exciting new class of models is fertile for 
further development and application. 

The present paper focuses on yields of zero coupon bonds. A particu- 
larly interesting direction for future research is the extension of our mod- 
eling framework to yields implied by commodities that include convenience 
factors. For example, see Casassus and Collin-Dufresne (2005) and Chua 
et al. (2008). A potential difficulty in this regard is the consistent control of 
multiple parameters: the commodity, the maturity and the liquidity of said 
maturity, for example. Another interesting direction of research is to develop 
nonlinear time series models for functional data; existing approaches for non- 
linear univariate time series modeling [see Fan and Yao (2003), Section 1.5.4] 
may be helpful for that purpose. 

Acknowledgments. The authors would like to express their sincere grat- 
itude to the Editor, Associate Editor and reviewers whose comments have 
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SUPPLEMENTARY MATERIAL 

Simulation studies and technical proofs (DOI: 10. 1214/12- AOAS551SUPP ; 
.pdf). The online supplement contains the following: (1) additional simula- 
tion studies to further illustrate the advantages of our method; (2) detailed 
proofs of Theorem 2.1 and Propositions 2.1-2.4. 
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